ABSTRACT

We show that the inference rules for multivalued dependencies cannot
be extended to a complete set of inference rules for embedded
multivalued dependencies. A new type of dependency, called the subset
dependency, is introduced. Subset dependencies are a generalization of
embedded multivalued dependencies. We give a set of inference rules
for subset dependencies and investigate their properties.


CR  categories:  4.33 

Key words and phrases: multivalued dependency, embedded multivalued
dependency, subset dependency, inference rule, relational database.

1.  Introduction

The relational database model [Cod] uses dependencies as a semantic
tool for expressing properties of the data. Functional [Arm,Cod] and
multivalued dependencies [BFH,Fa1,Zan] are the most common types of
dependencies, and they have been investigated thoroughly (e.g.,
[Bel,BB,Bi1,Bi2,Fa2,HIT,Mak,Men,Nic,Sag,SaF]).

A complete utilization of multivalued dependencies requires that we
deal also with embedded multivalued dependencies, i.e., those
multivalued dependencies that hold in a projection of a relation but
not necessarily in the relation itself. In contrast to functional and
multivalued dependencies, the properties of embedded multivalued
dependencies are substantially unknown. Attempts have been made to
extend the inference rules of [BFH] for multivalued dependencies to a
complete set of inference rules for embedded multivalued dependencies
[TK1,TK2]. However, in this paper we show that no such extension
exists. The proof is carried out by showing that for every positive
integer n, there is a set Σ of n embedded multivalued dependencies
that implies another embedded multivalued dependency σ, but the only
embedded multivalued dependencies implied by any proper subset of Σ
are those obtained by augmentation and projection.

We also introduce a new type of dependency, called the subset
dependency, that is a generalization of the embedded multivalued
dependency. A set of inference rules for subset dependencies is
presented. This set of rules is not known to be complete. However, it
is superior to the rules of [BFH] in the following sense. We show the
existence of a subclass of embedded multivalued dependencies for which
one cannot obtain a complete set of inference rules by extending the
rules of [BFH], and yet our rules are complete for this subclass. Our
rules also imply the rules of [BFH] for multivalued dependencies, and
the known rules for embedded multivalued dependencies [ABU,Fa1,TK1].


2.  Basic Definitions and Results

2.1.  The Relational Model for Databases

The relational model for databases assumes that the data is stored
in tables called relations. The columns of a table correspond to
attributes, and the rows to records or tuples. Each attribute has an
associated domain of values. It is convenient to regard a tuple as a
mapping from the attributes to their domains, since no canonical
ordering of the attributes is needed in this way. A relation scheme
is a set of attributes labeling the columns of a table. We often use
the relation scheme itself as the name of the table. A relation can
be viewed as the "current value" of a relation scheme.

Suppose that r is a relation defined on a relation scheme X. Let μ
be a tuple of r and A an attribute in X. The tuple μ maps the
attribute A to μ(A), and μ(A) is called the A-value of μ. If Y is a
subset of X, then μ(Y) is a tuple defined only on the attributes of
Y; the tuple μ(Y) maps each attribute A of Y to μ(A). We call μ(Y) a
Y-value in r and usually denote it by y. If tuples μ and ν agree on
all the attributes of the set X, then we write μ(X) = ν(X). The
projection of the relation r onto Y is obtained by removing the
columns of r not corresponding to attributes of Y and identifying
common tuples, i.e.,

r(Y) = {μ(Y) | μ is a tuple of r}
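
The projection can be illustrated with a minimal sketch. The
representation of a tuple as a frozenset of (attribute, value) pairs
and the names make_tuple and project are assumptions made for this
illustration only; they do not appear in the paper.

```python
def make_tuple(**values):
    """Hypothetical helper: a tuple is a mapping from attributes to values,
    stored as a hashable frozenset of (attribute, value) pairs."""
    return frozenset(values.items())


def project(r, attrs):
    """Return r(attrs): restrict every tuple to attrs and merge duplicates."""
    attrs = frozenset(attrs)
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


# A relation over the scheme ABC.
r = {
    make_tuple(A=1, B=2, C=3),
    make_tuple(A=1, B=2, C=4),
    make_tuple(A=5, B=6, C=3),
}

# The first two tuples have the same AB-value, so they are identified.
r_ab = project(r, "AB")
```

Using a set for the result is what "identifying common tuples" amounts
to in this representation.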

We  use  the  letters  A,B,C,...  to  denote  attributes,  and  the  letters 
...,X,Y,Z  to  denote  sets  of  attributes.  A  string  of  attributes  (e.g., 
ABCD)  denotes  the  set  containing  these  attributes,  and  the  union  of  two 
sets  X  and  Y  is  written  as  XY. 


2.2.  Dependencies

In many cases the data must satisfy certain constraints. Functional
[Arm,Cod] and multivalued [BFH,Fa1,Zan] dependencies are the most
common types of dependencies. In this paper we consider only
multivalued dependencies. A multivalued dependency (abbr. MVD) is a
statement of the form X →→ Y, where both X and Y are sets of
attributes. Suppose that r is a relation on a relation scheme U. Let
Z be the set of all the attributes in U that are neither in X nor in
Y. The relation r satisfies the MVD X →→ Y (or X →→ Y holds in r) if
and only if for all tuples μ1 and μ2 in r, if μ1(X) = μ2(X), then
there are tuples μ3 and μ4 in r such that

(i)   μ3(X) = μ1(X), μ3(Y) = μ1(Y), and μ3(Z) = μ2(Z)
(ii)  μ4(X) = μ2(X), μ4(Y) = μ2(Y), and μ4(Z) = μ1(Z).

In other words, X →→ Y means that the set of Y-values associated with
a particular X-value must be independent of the values of the rest of
the attributes.

If we consider relations over the relation scheme U, then the MVD
X →→ Y is written as X →→ Y | Z, where Z = U - X - Y. Suppose that
STV is a proper subset of U. The statement S →→ T | V is called an
embedded multivalued dependency (abbr. EMVD). The EMVD S →→ T | V
holds in a relation r over U if the MVD S →→ T | V holds in the
relation r(STV), i.e., in the projection of r onto STV. The EMVD
S →→ T | V is said to be defined on STV. Note that an MVD is also an
EMVD, but the converse is not necessarily true.
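
The definition can be checked mechanically on small relations. The
following is a sketch only: it assumes that X, Y, and Z are pairwise
disjoint, represents tuples as Python dicts, and uses the hypothetical
names restrict and holds_emvd. The example relation is made up for
illustration.

```python
from itertools import product


def restrict(t, attrs):
    """The restriction t(attrs) of a tuple (a dict), as a hashable value."""
    return tuple(sorted((a, t[a]) for a in attrs))


def holds_emvd(r, X, Y, Z):
    """Test X ->> Y | Z by checking the MVD definition in the projection r(XYZ).

    Assumes X, Y, Z are pairwise disjoint sets of attribute names.
    """
    X, Y, Z = set(X), set(Y), set(Z)
    XY = X | Y
    scheme = XY | Z
    projection = {restrict(t, scheme) for t in r}
    rows = [dict(p) for p in projection]
    # Iterating over ordered pairs covers both condition (i) and condition (ii).
    for t1, t2 in product(rows, repeat=2):
        if restrict(t1, X) != restrict(t2, X):
            continue
        # Required tuple: X- and Y-values from t1, Z-value from t2.
        t3 = {a: (t1[a] if a in XY else t2[a]) for a in scheme}
        if restrict(t3, scheme) not in projection:
            return False
    return True


# A relation over ABCD in which A ->> B | C holds (in the projection onto ABC)
# although the MVD A ->> B | CD fails in the relation itself.
r = [
    {"A": 0, "B": 0, "C": 0, "D": 0},
    {"A": 0, "B": 1, "C": 1, "D": 1},
    {"A": 0, "B": 0, "C": 1, "D": 2},
    {"A": 0, "B": 1, "C": 0, "D": 3},
]
embedded_holds = holds_emvd(r, "A", "B", "C")
full_mvd_holds = holds_emvd(r, "A", "B", "CD")
```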

A dependency σ is a consequence of a set of dependencies Σ (or Σ
implies σ) if for all relations r, σ holds in r if all the
dependencies of Σ hold in r. If Σ is a set of MVD's (and/or
functional dependencies) and σ is an MVD (or a functional
dependency), then there are inference rules that can be used to infer
σ from Σ if and only if σ is a consequence of Σ (i.e., the rules are
complete) [Arm,BFH]. In this case there are also efficient algorithms
for deciding whether Σ implies σ [Bel,BB,HIT,Sag]. The properties of
EMVD's, however, are substantially unknown. (For example, the problem
of deciding whether an EMVD σ is implied by a set of EMVD's Σ is not
even known to be decidable.)

A dependency σ is trivial if for all relations r, σ holds in r. A
trivial dependency is implied by any other set of dependencies. An
EMVD X →→ Y | Z is trivial if either XY or XZ is equal to XYZ.

The following is a complete set of inference rules for MVD's (U is
assumed to be the set of all the attributes).

MVD0   (Complementation):
       Let X, Y, and Z be sets of attributes such that their union
       is U and Y ∩ Z ⊆ X. Then X →→ Y if and only if X →→ Z.

MVD1   (Reflexivity):
       If Y ⊆ X, then X →→ Y.

MVD2   (Augmentation):
       If V ⊆ W and X →→ Y, then XW →→ YV.

MVD3   (Transitivity):
       If X →→ Y and Y →→ Z, then X →→ Z - Y.

These rules can be used to infer EMVD's from other EMVD's as long as
all the EMVD's involved are defined on the same set of attributes.
(For example, an EMVD X →→ Y | Z can be augmented with a set W only
if W is contained in XYZ.) The following is a rule for inferring
EMVD's [ABU,Fa1].

EMVD1  (Projection):
       If X →→ Y | Z, Y' ⊆ Y and Z' ⊆ Z, then X →→ Y' | Z'.

3.  Subset and Equivalence Dependencies

Let Z and X be sets of attributes and let x be an X-value in a
relation r. Z_r(x) is the set of all Z-values associated with the
X-value x, i.e.,

Z_r(x) = {z | there is a tuple μ in r such that μ(Z) = z and μ(X) = x}

Proposition 1: Z_r(xy) is a subset of Z_r(x) for all X-values x and
Y-values y in r.



Proof: If xy is not an XY-value in r, then Z_r(xy) is empty and,
hence, Z_r(xy) ⊆ Z_r(x). Suppose that xy is an XY-value in r. For all
tuples μ of r, if μ(XY) = xy then μ(X) = x and, by definition, Z_r(xy)
must be a subset of Z_r(x).  []

Lemma 2: If Z_r(x) ⊆ Z_r(y) for all XY-values xy in a relation r, and
Z_r(y) ⊆ Z_r(w) for all YW-values yw in r, then Z_r(x) ⊆ Z_r(w) for
all XW-values xw in r.

Proof: Suppose that μ is a tuple in r such that μ(XW) = xw. Let
μ(Y) = y. Since xy is an XY-value in r, Z_r(x) ⊆ Z_r(y). Similarly,
Z_r(y) ⊆ Z_r(w). Hence, Z_r(x) ⊆ Z_r(w) for all XW-values xw in r.  []

Lemma 3: The EMVD X →→ Y | Z holds in a relation r if and only if
Z_r(x) = Z_r(xy) for all XY-values xy in r.

Proof: [Fa1] Only if. By Proposition 1, Z_r(xy) ⊆ Z_r(x) and,
therefore, it suffices to prove that Z_r(x) ⊆ Z_r(xy). Let
z ∈ Z_r(x), i.e., there is a tuple μ1 in r such that μ1(X) = x and
μ1(Z) = z. Let μ1(Y) = y'. Since xy is an XY-value in r, there is a
tuple μ2 in r such that μ2(X) = x and μ2(Y) = y. Let μ2(Z) = z'. By
definition of EMVD's, there is a tuple μ3 in r such that μ3(X) = x,
μ3(Y) = y, and μ3(Z) = z. Hence, z ∈ Z_r(xy).

If. Let μ1 and μ2 be tuples of r such that

(1)  μ1(X) = x, μ1(Y) = y, and μ1(Z) = z
(2)  μ2(X) = x, μ2(Y) = y', and μ2(Z) = z'

Obviously, both z and z' are in Z_r(x). Since Z_r(x) ⊆ Z_r(xy), z and
z' are also in Z_r(xy). Therefore, there must be a tuple μ3 in r such
that

μ3(X) = x, μ3(Y) = y, and μ3(Z) = z'

Similarly, since Z_r(x) ⊆ Z_r(xy'), there is a tuple μ4 in r such that

μ4(X) = x, μ4(Y) = y', and μ4(Z) = z

Thus, X →→ Y | Z holds in r.  []


A subset dependency (abbr. SD) is a statement of the form
Z(X) ⊆ Z(Y), where X, Y, and Z are sets of attributes and both X and Y
are disjoint from Z. The SD Z(X) ⊆ Z(Y) holds in a relation r if and
only if Z_r(x) ⊆ Z_r(y) for all XY-values xy in r.

An equivalence dependency (abbr. ED) is a statement of the form
Z(X) = Z(Y), where X, Y, and Z are sets of attributes and both X and Y
are disjoint from Z. The ED Z(X) = Z(Y) holds in a relation r if and
only if Z_r(x) = Z_r(y) for all XY-values xy in r.
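
The two definitions translate directly into a small checker. This is
a sketch under assumptions: tuples are Python dicts, and the names
restrict, value_set, holds_sd, and holds_ed are hypothetical. The
example relation is made up here; it is not the relation of Figure 1,
although it has the same flavor (an ED holds while the corresponding
EMVD fails).

```python
def restrict(t, attrs):
    """The restriction t(attrs) of a tuple (a dict), as a hashable value."""
    return tuple(sorted((a, t[a]) for a in attrs))


def value_set(r, Z, X, x):
    """Z_r(x): all Z-values that occur in r together with the X-value x."""
    return {restrict(t, Z) for t in r if restrict(t, X) == x}


def holds_sd(r, Z, X, Y):
    """Z(X) ⊆ Z(Y) holds in r iff Z_r(x) ⊆ Z_r(y) for every XY-value xy in r."""
    Z, X, Y = set(Z), set(X), set(Y)
    for t in r:  # every tuple contributes one XY-value
        x, y = restrict(t, X), restrict(t, Y)
        if not value_set(r, Z, X, x) <= value_set(r, Z, Y, y):
            return False
    return True


def holds_ed(r, Z, X, Y):
    """Z(X) = Z(Y) holds in r iff both subset dependencies hold."""
    return holds_sd(r, Z, X, Y) and holds_sd(r, Z, Y, X)


# Single attributes named X, Y, Z; every X-value and every Y-value
# is associated with both Z-values.
r = [
    {"X": 0, "Y": 0, "Z": 0},
    {"X": 0, "Y": 1, "Z": 1},
    {"X": 1, "Y": 0, "Z": 1},
    {"X": 1, "Y": 1, "Z": 0},
]
ed_holds = holds_ed(r, "Z", "X", "Y")
# The SD Z(X) ⊆ Z(XY) fails: the XY-value (0, 0) occurs only with Z = 0.
sd_x_xy_holds = holds_sd(r, "Z", "X", "XY")
```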

Example 1: Consider the relation of Figure 1. The ED Z(X) = Z(Y)
holds in this relation. Note that the EMVD X →→ Y | Z does not hold
in this relation.  []

Proposition 4: Z(X) = Z(Y) holds in a relation r if and only if
Z(X) ⊆ Z(Y) and Z(Y) ⊆ Z(X) hold in r.

Proof: Immediate from the definitions.  []

Lemma 5: If W ⊆ V, then Z(V) ⊆ Z(W) is a trivial SD.

Proof: Let wv be a WV-value in a relation r. By Proposition 1,
Z_r(v) ⊆ Z_r(w). Thus, Z(V) ⊆ Z(W) holds in r.  []

Figure 1


Lemma 6: If Z(X) ⊆ Z(Y) and Z(Y) ⊆ Z(W) hold in a relation r, then
Z(X) ⊆ Z(W) also holds in r.

Proof: Follows directly from the definitions and Lemma 2.  []

Lemma 7: The EMVD X →→ Y | Z holds in a relation r if and only if
Z(X) = Z(XY) holds in the relation r.

Proof: Follows from the definitions and Lemma 3.  []

Corollary 8: The EMVD X →→ Y | Z holds in a relation r if and only if
Z(X) ⊆ Z(XY) holds in the relation r.

Proof: Follows from Lemma 7, Lemma 5, and Proposition 4.  []

Corollary 8 implies that a set of EMVD's is equivalent to a set of
SD's. Consequently, from now on EMVD's are treated as SD's.

An SD of the form Z(X) ⊆ Z(Y), where Z is a fixed set of attributes,
is called a Z subset dependency (abbr. ZSD). In this section we
consider a set Σ of ZSD's (i.e., a set of SD's with a fixed Z). In
particular, note that a set of EMVD's of the form X →→ Y | Z, for a
fixed Z, is a set of ZSD's. The kernel of Σ, written KER(Σ), is the
set

{X | there is a Y such that either Z(X) ⊆ Z(Y) or Z(Y) ⊆ Z(X) is in Σ}

A Z-graph for Σ is a directed graph defined as follows. The nodes of
a Z-graph correspond to sets of attributes that are disjoint from Z.
A node corresponding to a set of attributes X is denoted by [X]. A
Z-graph has a node for each set in KER(Σ) and possibly additional
nodes that correspond to other sets. The following rules imply all
the (directed) edges of a Z-graph G_Σ.

Rule 1: If [X] and [Y] are nodes of G_Σ and X ⊆ Y, then there is an
edge from [X] to [Y].

Rule 2: For each ZSD Z(X) ⊆ Z(Y) in Σ, there is an edge in G_Σ from
[Y] to [X].

The minimal Z-graph for Σ is the Z-graph containing only nodes that
correspond to sets of KER(Σ). By reflexivity (Lemma 5) and
transitivity (Lemma 6) of SD's, we obtain the following lemma.

Lemma 9: If there is a directed path in a Z-graph G_Σ from [Y] to
[X], then Z(X) ⊆ Z(Y) is a consequence of Σ.

We now prove that the converse of Lemma 9 is also true.

Lemma 10: If there is no path in a Z-graph G_Σ from a node [Y] to a
node [X], then there is a relation r in which all the dependencies of
Σ hold, but Z(X) ⊆ Z(Y) fails.

Proof: We construct a relation r over the domain {0,1} as follows.
Consider the set

CON(X) = {W | there is a path from [W] to [X] in G_Σ}

For all sets W in CON(X), the relation r has a pair of tuples as
follows. Both tuples map all the attributes of W to 0, and all the
attributes that are neither in W nor in Z to 1. One tuple in the pair
maps all the attributes of Z to 0, and the other tuple maps all the
attributes of Z to 1. The relation r has one more tuple, denoted by
μ, that maps all the attributes to 0.

Conventionally, the X-value of a set of attributes X consisting only
of 0's is denoted by x_0; the X-value consisting only of 1's is
denoted by x_1.

Claim 1: Let V be disjoint from Z. If v is a V-value occurring in a
tuple of r other than μ, then Z_r(v) = {z_0, z_1}. If v occurs in μ
(i.e., v = v_0), then Z_r(v) contains z_0.

Claim 1 follows immediately from the construction of r.

Let V be disjoint from Z. Suppose that Z_r(v_0) = {z_0, z_1}. (Note
that v_0 occurs in μ.) Thus, there is a tuple ν in r corresponding to
a set W of CON(X) with a V-value v_0 and a Z-value z_1. It follows
that V must be a subset of W, because W contains all the attributes
(except those in Z) that are mapped to 0 by ν. But in this case there
is an edge from [V] to [W] and, hence, there is a path from [V] to
[X] (because there is a path from [W] to [X]). Thus, we have proved
the following claim.

Claim 2: Let V be disjoint from Z. If Z_r(v_0) = {z_0, z_1}, then V
is in CON(X) (i.e., there is a path from [V] to [X]).

We now show that all the ZSD's of Σ hold in r but Z(X) ⊆ Z(Y) fails
in r. Indeed, Z_r(x_0) = {z_0, z_1}, because X is in CON(X). Since
there is no path from [Y] to [X], Z_r(y_0) = {z_0} by Claim 2. Thus
Z(X) ⊆ Z(Y) fails in r.

Let Z(W) ⊆ Z(V) be any ZSD in Σ. By Claim 1, in order to prove that
Z(W) ⊆ Z(V) holds in r, it is sufficient to show that
Z_r(w_0) ⊆ Z_r(v_0) for the WV-value w_0 v_0 occurring in μ. By
Claim 1, if Z_r(w_0) = {z_0} we are done. So suppose that
Z_r(w_0) = {z_0, z_1}. By Claim 2, there is a path from [W] to [X]
and, hence, there is a path from [V] to [X] (because Z(W) ⊆ Z(V)
implies an edge from [V] to [W]). Hence, Z_r(v_0) = {z_0, z_1}. This
completes the proof.  []
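
The construction in the proof of Lemma 10 can be carried out
mechanically. The sketch below is an illustration under assumptions:
a ZSD Z(X) ⊆ Z(Y) is encoded as the pair (X, Y), the Z-graph contains
only the nodes of KER(Σ) plus the two sets of the target, and the
names counterexample and holds_sd are hypothetical.

```python
def restrict(t, attrs):
    """The restriction t(attrs) of a tuple (a dict), as a hashable value."""
    return tuple(sorted((a, t[a]) for a in attrs))


def holds_sd(r, Z, X, Y):
    """Z(X) ⊆ Z(Y) holds in r iff Z_r(x) ⊆ Z_r(y) for every XY-value xy in r."""
    Z, X, Y = set(Z), set(X), set(Y)

    def zr(S, s):
        return {restrict(t, Z) for t in r if restrict(t, S) == s}

    return all(zr(X, restrict(t, X)) <= zr(Y, restrict(t, Y)) for t in r)


def counterexample(sigma, target, Z):
    """Build the relation of Lemma 10 for a target Z(X) ⊆ Z(Y) with no path [Y] -> [X]."""
    X, Y = map(frozenset, target)
    Z = frozenset(Z)
    deps = [(frozenset(a), frozenset(b)) for a, b in sigma]
    nodes = {X, Y} | {s for d in deps for s in d}
    # Predecessor lists of the Z-graph, used to collect CON(X).
    preds = {n: set() for n in nodes}
    for a in nodes:
        for b in nodes:
            if a < b:
                preds[b].add(a)      # Rule 1: edge [a] -> [b] when a ⊆ b
    for a, b in deps:
        preds[a].add(b)              # Rule 2: Z(a) ⊆ Z(b) gives edge [b] -> [a]
    con, stack = {X}, [X]
    while stack:
        for m in preds[stack.pop()]:
            if m not in con:
                con.add(m)
                stack.append(m)
    if Y in con:
        raise ValueError("target is implied; no counterexample exists")
    attrs = Z.union(*nodes)
    r = [{a: 0 for a in attrs}]      # the all-zero tuple μ
    for W in con:
        for zbit in (0, 1):          # one tuple with z_0, one with z_1
            r.append({a: (zbit if a in Z else 0 if a in W else 1) for a in attrs})
    return r


# Σ = { Z(A) ⊆ Z(AB) }; the target Z(B) ⊆ Z(A) is not implied.
sigma = [("A", "AB")]
r = counterexample(sigma, ("B", "A"), "Z")
sigma_holds = all(holds_sd(r, "Z", x, y) for x, y in sigma)
target_holds = holds_sd(r, "Z", "B", "A")
```

Duplicate tuples may appear in the list r when a set of CON(X)
contains every attribute outside Z; they do not affect the checks.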


Lemma 9 and Lemma 10 provide a method for deciding whether a set of
ZSD's Σ implies another ZSD Z(X) ⊆ Z(Y). In order to do so, construct
a Z-graph G_Σ with nodes corresponding to X and Y, and check whether
there is a path from [Y] to [X].
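
The decision method can be sketched as follows. The encoding of a ZSD
Z(X) ⊆ Z(Y) as the pair (X, Y) and the name implies_zsd are
assumptions of this illustration; the fixed set Z does not need to be
passed, because it plays no role in the graph.

```python
def implies_zsd(sigma, target):
    """Decide whether the ZSD's in sigma imply the target ZSD.

    sigma  : iterable of pairs (X, Y), each meaning Z(X) ⊆ Z(Y)
    target : a pair (X, Y) meaning Z(X) ⊆ Z(Y)
    """
    X, Y = map(frozenset, target)
    deps = [(frozenset(a), frozenset(b)) for a, b in sigma]
    nodes = {X, Y} | {s for d in deps for s in d}   # KER(Σ) plus [X] and [Y]
    edges = {n: set() for n in nodes}
    for a in nodes:
        for b in nodes:
            if a < b:
                edges[a].add(b)      # Rule 1: [a] -> [b] when a ⊆ b
    for a, b in deps:
        edges[b].add(a)              # Rule 2: Z(a) ⊆ Z(b) gives [b] -> [a]
    # Depth-first search from [Y]; the target is implied iff [X] is reached.
    seen, stack = {Y}, [Y]
    while stack:
        for m in edges[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return X in seen


# Σ contains the EMVD A ->> B | Z, written as the ZSD Z(A) ⊆ Z(AB).
sigma = [("A", "AB")]
a_in_b = implies_zsd(sigma, ("A", "B"))   # path [B] -> [AB] -> [A]
b_in_a = implies_zsd(sigma, ("B", "A"))   # no path from [A] to [B]
```

The quadratic loop over pairs of nodes is the part that dominates the
cost of building the graph; the search itself is linear in the size
of the graph.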

Theorem 11: Testing whether a ZSD Z(X) ⊆ Z(Y) is a consequence of a
set of ZSD's Σ can be done in O(n^2) time, where n is the size of the
input.

Proof: Assuming that the attributes in the input are represented by
the numbers 1,...,k, a Z-graph containing nodes for X and Y can be
constructed in O(n^2) time. Testing whether there is a path from [Y]
to [X] requires only linear time (in the size of the graph).  []

A Z embedded multivalued dependency (abbr. Z-EMVD) is an EMVD of the
form X →→ Y | Z, where Z is a fixed set of attributes.

Corollary 12: Testing whether a Z-EMVD X →→ Y | Z is a consequence of
a set of Z-EMVD's Σ can be done in O(n^2) time, where n is the size of
the input.


5.  ZSD's  and  EMVD's 


In this section we investigate the EMVD's implied by a set of ZSD's
Σ. Let MG_Σ be the minimal Z-graph for Σ. The Z-EMVD cover of Σ,
written Z-EMVD^C(Σ), is the set

{X →→ Y | Z : there is a path from [XY] to [X] in MG_Σ}

We will show that an EMVD τ is implied by Σ only if there is a Z-EMVD
σ in Z-EMVD^C(Σ) such that τ is obtained from σ by augmentation and
projection.
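
The cover can be computed by a reachability search in the minimal
Z-graph. The sketch below makes several assumptions: a ZSD
Z(X) ⊆ Z(Y) is encoded as the pair (X, Y), each member X →→ Y | Z of
the cover is reported as the pair (X, Y) with Y disjoint from X, and
trivial members (with Y contained in X) are omitted. The name
zemvd_cover is hypothetical.

```python
def zemvd_cover(sigma):
    """Nontrivial members of the Z-EMVD cover, as pairs (X, Y) for X ->> Y | Z."""
    deps = [(frozenset(a), frozenset(b)) for a, b in sigma]
    nodes = {s for d in deps for s in d}            # KER(Σ): the minimal Z-graph
    edges = {n: set() for n in nodes}
    for a in nodes:
        for b in nodes:
            if a < b:
                edges[a].add(b)      # Rule 1
    for a, b in deps:
        edges[b].add(a)              # Rule 2

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            for m in edges[stack.pop()]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    cover = set()
    for s in nodes:                  # s plays the role of XY
        for t in reachable(s):       # t plays the role of X
            if t < s:
                cover.add((t, s - t))
    return cover


# Σ = { A ->> B | Z, AB ->> C | Z }, written as ZSD's.
sigma = [("A", "AB"), ("AB", "ABC")]
cover = zemvd_cover(sigma)
```

In this example the cover also contains A →→ BC | Z, which is not a
member of Σ but corresponds to the path [ABC], [AB], [A].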


Lemma 13: If a Z-EMVD X →→ Y | Z is a consequence of a set of ZSD's
Σ, then X →→ Y | Z can be obtained from a Z-EMVD in Z-EMVD^C(Σ) by
augmentation and projection.

Proof: If both X and XY are in KER(Σ), then X →→ Y | Z is in
Z-EMVD^C(Σ) and we are done. Assume that neither X nor XY is in
KER(Σ) (the other two cases, in which either X or XY is in KER(Σ),
are proved similarly). Let G_Σ be a Z-graph in which all the nodes
correspond to members of the set KER(Σ) ∪ {X, XY}. Since X →→ Y | Z
is a consequence of Σ, there is a path in G_Σ from [XY] to [X]. Let
the first edge in this path be from [XY] to [S], and the last edge be
from [T] to [X]. An edge from [XY] to [S] can exist only if XY ⊆ S.
Similarly, T ⊆ X and, hence, T ⊆ S. Let S be written as TS', where S'
is disjoint from T. Thus, Z-EMVD^C(Σ) contains the Z-EMVD
T →→ S' | Z. It is easy to show that X →→ Y | Z follows from
T →→ S' | Z by augmentation and projection.  []

Lemma 14: If W →→ V | Y is a nontrivial EMVD implied by a set Σ of
ZSD's, then either V ⊆ Z or Y ⊆ Z. (It is assumed that W, V, and Y are
pairwise disjoint.)

Proof: Construct a relation r over {0,1} with two tuples that agree
exactly on the attributes of Z and W. Let z be the Z-value of the two
tuples in r. Obviously, for every X-value x, Z_r(x) = {z}. Thus all
the ZSD's of Σ hold in r and, hence, W →→ V | Y holds in r. By
Lemma 3 in [SaF], either V ⊆ Z or Y ⊆ Z.  []

Suppose that σ is a nontrivial EMVD implied by a set of ZSD's Σ. By
Lemma 14, σ can be written as W' →→ V' | Z', where Z' ⊆ Z. (It is
assumed that W', V', and Z' are pairwise disjoint.) We now prove the
following lemma.

Lemma 15: There exists a Z-EMVD W →→ V | Z implied by Σ such that
W' →→ V' | Z' can be obtained from W →→ V | Z by augmentation and
projection.

Proof: We use the same method as in the proof of Lemma 10. In that
proof we built a relation r having two Z-values, z_0 and z_1, that
disagreed on all the columns of Z. It is sufficient, however, that
z_0 and z_1 would not be the same. Thus z_1 is replaced with z, where
z has 0's exactly in the columns of W' ∩ Z and 1's in all the other
columns of Z. Since W' →→ V' | Z' is a nontrivial EMVD, Z' - W' is
nonempty (i.e., some columns of z are indeed 1 and z is different
from z_0).

Let W = W' - Z, and let G_Σ be a Z-graph for Σ containing the nodes
[W] and [WV']. Construct a relation r as in the proof of Lemma 10
using the Z-values z_0 and z (instead of z_1), and the set CON(W).
Recall that μ is the tuple of r that maps all the attributes to 0.
Since W is in CON(W), the relation r has a tuple ν such that ν maps
all the attributes of W to 0, all the attributes of Z to z, and all
the other attributes to 1. Note that the tuples ν and μ agree exactly
on the columns of W'.

All the ZSD's of Σ hold in r and, hence, W' →→ V' | Z' also holds in
r. Therefore, there is a tuple t in r such that t(W') = μ(W'),
t(V') = μ(V'), and t(Z') = ν(Z'). The Z-value of t must be z, because
ν maps some attributes of Z' to 1. Therefore, t and ν agree on all
columns of Z. Since they should disagree on all the columns of V', it
follows that V' is disjoint from Z.

By the construction of r, WV' must be in CON(W), because WV' contains
all the columns (except those in Z) in which t has 0's. Hence, there
is a path in G_Σ from [WV'] to [W]. Therefore, W →→ V' | Z is a
consequence of Σ. Obviously, W' →→ V' | Z' can be obtained from
W →→ V' | Z by augmentation and projection.  []


6.  The Nonextendibility of the MVD Inference Rules to EMVD's

In this section we show that for any positive integer n, one can find
a set Σ of n EMVD's that implies another EMVD σ, but any n-1 EMVD's of
Σ imply only those EMVD's that can be obtained by projection and
augmentation. This result indicates that the inference rules of [BFH]
for MVD's cannot be extended in any meaningful way to a set of
inference rules for EMVD's.


Given a positive integer n, let X_0, X_1, ..., X_{n-1}, Z be pairwise
disjoint sets of attributes. Let Σ consist of the following Z-EMVD's.

X_0     →→ X_1     | Z
X_1     →→ X_2     | Z
  ...
X_{n-2} →→ X_{n-1} | Z
X_{n-1} →→ X_0     | Z

That is, Σ contains the Z-EMVD X_i →→ X_{i+1} | Z for all
0 ≤ i ≤ n-2, and the Z-EMVD X_{n-1} →→ X_0 | Z. It is convenient to
assume that addition and subtraction of indices is done modulo n. For
example, X_n is X_0 and X_{-1} is X_{n-1}.
Lemma 16: X_0 →→ X_{n-1} | Z is a consequence of Σ.

Proof: To prove Lemma 16, construct the minimal Z-graph MG_Σ for Σ.
The graph MG_Σ has the following nodes:

(1)  [X_i] for 0 ≤ i ≤ n-1, and
(2)  [X_i X_{i+1}] for 0 ≤ i ≤ n-1.

The edges of MG_Σ can be classified in the following groups:

(1)  for 0 ≤ i ≤ n-1, an edge from [X_i] to [X_i X_{i+1}], and
(2)  for 0 ≤ i ≤ n-1, an edge from [X_{i+1}] to [X_i X_{i+1}].

The edges in the above groups are implied by Rule 1. The following
group of edges is implied by Rule 2.

(3)  for 0 ≤ i ≤ n-1, an edge from [X_i X_{i+1}] to [X_i].

Figure 2 describes the graph MG_Σ for n=4. The edges in group (3) are
denoted by broken lines. It is easy to see that there is a path from
[X_{n-1} X_0] to [X_0]. Hence, X_0 →→ X_{n-1} | Z is a consequence of
Σ.  []

Lemma 17: Let Σ' be a set of n-1 dependencies from Σ. If σ' is
implied by Σ', then there is a Z-EMVD σ in Σ' such that σ' is obtained
from σ by augmentation and projection.

Proof: Consider the graph MG_Σ. Obviously, a path in MG_Σ that
corresponds to a Z-EMVD implied by Σ must start in a node
[X_i X_{i+1}] (for some 0 ≤ i ≤ n-1) and terminate in either [X_i] or
[X_{i+1}]. For all i (0 ≤ i ≤ n-1), there is an edge from
[X_i X_{i+1}] to [X_i], because X_i →→ X_{i+1} | Z is in Σ. It is easy
to see that there is a path from [X_i X_{i+1}] to [X_{i+1}] for all i.
That is, X_{i+1} →→ X_i | Z is implied by Σ (0 ≤ i ≤ n-1).

Figure 2

Claim 3: For all 0 ≤ i ≤ n-1, every path from [X_i X_{i+1}] to
[X_{i+1}] uses all the edges in group (3) (i.e., all the edges implied
by the EMVD's in Σ).

Proof: Suppose not. Among all the paths from [X_i X_{i+1}] to
[X_{i+1}] that do not use all the edges in group (3), consider a path
p that has a minimum number of edges. For all nodes [X_i]
(0 ≤ i ≤ n-1), there is only one edge directed to [X_i] and this edge
is in group (3). Since p does not use all the edges in group (3),
there is a j (0 ≤ j ≤ n-1) such that p visits [X_{j+1}] but does not
visit [X_j] (because [X_{i+1}] is in the path, and some [X_k] is not
in the path).

Case 1: Suppose that j = i. The only edge out of [X_i X_{i+1}] is
directed toward [X_i]. If [X_i] is not visited, then p cannot be a
path from [X_i X_{i+1}] to [X_{i+1}].

Case 2: j ≠ i. Note that p has a minimum number of edges and, hence,
no node is visited more than once. Since j ≠ i, [X_{j+1}] cannot be
the last node in p. There are two edges out of [X_{j+1}]; one edge is
directed to [X_j X_{j+1}] and the other edge is directed to
[X_{j+1} X_{j+2}]. The only edge out of [X_j X_{j+1}] is to [X_j]. The
path p does not use this edge, and so p cannot visit [X_j X_{j+1}].
Thus, after entering [X_{j+1}], p visits [X_{j+1} X_{j+2}]. But the
only way to move into [X_{j+1}] is from [X_{j+1} X_{j+2}].
Consequently, [X_{j+1} X_{j+2}] is visited twice. This contradiction
completes the proof of the claim.

Now suppose that X_k →→ X_{k+1} | Z (for some 0 ≤ k ≤ n-1) is the
Z-EMVD in Σ but not in Σ'. A Z-graph for Σ' is obtained from MG_Σ by
removing the edge from [X_k X_{k+1}] to [X_k]. Now no edge is directed
toward [X_k] and, therefore, X_k →→ X_{k+1} | Z cannot be a
consequence of Σ'. By Claim 3, none of the Z-EMVD's X_{i+1} →→ X_i | Z
(0 ≤ i ≤ n-1) is in Z-EMVD^C(Σ'). Therefore, the only Z-EMVD's in
Z-EMVD^C(Σ') are those in Σ'. Thus, the lemma follows from Lemma 13
and Lemma 15.  []
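
The behavior described by Lemma 16 and Lemma 17 can be observed on a
small instance. The sketch below is an illustration under
assumptions: each X_i is taken to be the single attribute i, the
EMVD X_i →→ X_{i+1} | Z is encoded as the ZSD pair
(X_i, X_i X_{i+1}), and the names implied and cycle are hypothetical.
It checks only that the EMVD of Lemma 16 is lost when any one member
of Σ is dropped; it does not verify the full statement of Lemma 17.

```python
def implied(sigma, target):
    """Path test in a Z-graph: is the target ZSD (X, Y) a consequence of sigma?"""
    X, Y = map(frozenset, target)
    deps = [(frozenset(a), frozenset(b)) for a, b in sigma]
    nodes = {X, Y} | {s for d in deps for s in d}
    edges = {n: set() for n in nodes}
    for a in nodes:
        for b in nodes:
            if a < b:
                edges[a].add(b)      # Rule 1
    for a, b in deps:
        edges[b].add(a)              # Rule 2
    seen, stack = {Y}, [Y]
    while stack:
        for m in edges[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return X in seen


def cycle(n):
    """The set Σ of Section 6 with X_i = {i}: Z(X_i) ⊆ Z(X_i X_{i+1}), indices mod n."""
    return [({i}, {i, (i + 1) % n}) for i in range(n)]


n = 4
sigma = cycle(n)
goal = ({0}, {0, n - 1})             # X_0 ->> X_{n-1} | Z as a ZSD

with_all = implied(sigma, goal)
# Drop each dependency in turn and test the same goal again.
with_one_removed = [implied(sigma[:k] + sigma[k + 1:], goal) for k in range(n)]
```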


7.  Inference Rules for Subset Dependencies

Although the inference rules of [BFH] cannot be extended to a set of
inference rules for EMVD's or even for Z-EMVD's, it might be possible
to find a complete set of inference rules for SD's. In fact it should
be clear from the results obtained so far that the reflexivity rule
(Lemma 5) and the transitivity rule (Lemma 6) for SD's along with
augmentation and projection are complete for ZSD's and, hence, for
Z-EMVD's. In this section we give a set of inference rules for SD's.
We have no proof of completeness for these rules. However, these
rules imply the inference rules for MVD's, and the rule of Projection
for EMVD's. Furthermore, these rules are complete for ZSD's and
Z-EMVD's.

Following is a set of inference rules for SD's. (Note that Z is no
longer assumed to be a fixed set of attributes.)
SD1   (Reflexivity):
      Z(X) ⊆ Z(Y) for all Y ⊆ X.

SD2   (Augmentation):
      If ZW(X) ⊆ ZW(Y), then Z(WX) ⊆ Z(WY).

SD3   (Transitivity):
      If Z(X) ⊆ Z(Y) and Z(Y) ⊆ Z(W), then Z(X) ⊆ Z(W).

SD4   (Complementation):
      Let X be the intersection of VX and XY.
      If Z(VX) ⊆ Z(XY), then Y(VX) ⊆ Y(XZ).

SD5   (Projection):
      If Z(X) ⊆ Z(Y), then Z'(X) ⊆ Z'(Y) for all Z' ⊆ Z.

Lemma 18: The above SD rules are sound.

Proof: The rule of Reflexivity is sound by Lemma 5, and the rule of
Transitivity is sound by Lemma 6. We now prove that the other rules
are also sound.

Case 1: (SD2 - Augmentation) Suppose that ZW(X) ⊆ ZW(Y) holds in a
relation r. We have to show that Z(WX) ⊆ Z(WY) also holds in r. Let
wxy be a WXY-value in r. Suppose that z is in Z_r(wx). That is, zwx
is a ZWX-value in r. Therefore, zw is in ZW_r(x) and, hence, in
ZW_r(y). Thus, zwy is a ZWY-value in r, and z is in Z_r(wy).

Case 2: (SD4 - Complementation) Suppose that Z(VX) ⊆ Z(XY) holds in a
relation r. Let vxz be a VXZ-value in r, and suppose that
y ∈ Y_r(vx). Therefore, yvx is a YVX-value in r. But z is in Z_r(vx)
and, hence, it is in Z_r(xy). It follows that yxz is a YXZ-value in r
and y is in Y_r(xz). Thus, Y(VX) ⊆ Y(XZ) also holds in r.

Case 3: (SD5 - Projection) It is easy to obtain a direct proof for
this case. However, note that the rule of Projection follows from the
rules of Reflexivity, Transitivity, and Complementation.  []
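
Soundness can also be spot-checked by brute force on small random
relations. The sketch below is not a proof; it searches for a
violation of SD2, SD4, or SD5 over binary relations on the single
attributes V, W, X, Y, Z (so every attribute set is one attribute or
a union of them). The names holds_sd and random_relation are
hypothetical.

```python
import random


def restrict(t, attrs):
    """The restriction t(attrs) of a tuple (a dict), as a hashable value."""
    return tuple(sorted((a, t[a]) for a in attrs))


def holds_sd(r, Z, X, Y):
    """Z(X) ⊆ Z(Y) holds in r iff Z_r(x) ⊆ Z_r(y) for every XY-value xy in r."""
    Z, X, Y = set(Z), set(X), set(Y)

    def zr(S, s):
        return {restrict(t, Z) for t in r if restrict(t, S) == s}

    return all(zr(X, restrict(t, X)) <= zr(Y, restrict(t, Y)) for t in r)


def random_relation(attrs, rng, size):
    """A random relation over {0,1} with at most `size` distinct tuples."""
    return [{a: rng.randint(0, 1) for a in attrs} for _ in range(size)]


rng = random.Random(0)
violations = 0
for _ in range(300):
    r = random_relation("VWXYZ", rng, rng.randint(1, 6))
    # SD2: if ZW(X) ⊆ ZW(Y), then Z(WX) ⊆ Z(WY).
    if holds_sd(r, "ZW", "X", "Y") and not holds_sd(r, "Z", "WX", "WY"):
        violations += 1
    # SD4: if Z(VX) ⊆ Z(XY), then Y(VX) ⊆ Y(XZ).
    if holds_sd(r, "Z", "VX", "XY") and not holds_sd(r, "Y", "VX", "XZ"):
        violations += 1
    # SD5 with Z' = Z taken from ZW: if ZW(X) ⊆ ZW(Y), then Z(X) ⊆ Z(Y).
    if holds_sd(r, "ZW", "X", "Y") and not holds_sd(r, "Z", "X", "Y"):
        violations += 1
```

Since the rules are sound by Lemma 18, no violation can be found,
whatever relations the generator produces.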

Lemma 19: The rules of Reflexivity, Augmentation, Transitivity, and
Complementation are complete for MVD's.

Proof: We will show that the SD rules imply the MVD rules of
Section 2.

Case 1: (MVD0 - Complementation) Let V be the attributes which are
not in X or Y (i.e., V = U - X - Y), and let W be the attributes which
are not in X or Z. We have to show that V(X) ⊆ V(XY) implies
W(X) ⊆ W(XZ). By SD4, V(X) ⊆ V(XY) implies Y'(X) ⊆ Y'(XV), where
Y' = Y - X. But XV = XZ and Y' = W, and so we are done.

Case 2: (MVD1 - Reflexivity) Let Z be the attributes that are not in
X. By SD1 (Reflexivity), Z(X) ⊆ Z(X). By Corollary 8, Z(X) ⊆ Z(X) is
equivalent to X →→ Y, where Y ⊆ X.

Case 3: (MVD2 - Augmentation) Let Z be the attributes that are not in
X or Y. Let W' be the intersection of Z and W, and Z' = Z - W'. By
SD2 (Augmentation), Z'(XW') ⊆ Z'(XYW'). By reflexivity,
Z'(XW) ⊆ Z'(XW') and so by transitivity, Z'(XW) ⊆ Z'(XYW'). Since all
the attributes are contained in XYZ, XYW' must equal XYW. Thus
Z'(XW) ⊆ Z'(XYW). Since XYWZ' are all the attributes, by Corollary 8,
Z'(XW) ⊆ Z'(XYW) is equivalent to XW →→ YV, where V ⊆ W.

Case 4: (MVD3 - Transitivity) Let V = U - X - Y and W = U - Y - Z. We
write Z as Z̄ Z_x Z_y, where Z̄ = Z - X - Y, Z_x = Z ∩ X, and
Z_y = Z ∩ Y. Let T = U - X - Z̄ (note that T is the complement of X
and Z - Y). The MVD X →→ Y is equivalent to V(X) ⊆ V(XY). Since Z̄ is
contained in V, by projection,

(1)    Z̄(X) ⊆ Z̄(XY)

The MVD Y →→ Z is equivalent to W(Y) ⊆ W(YZ) and, by complementation,
it implies Z̄Z_x(Y) ⊆ Z̄Z_x(YW). By augmentation,

(2)    Z̄(Z_x Y) ⊆ Z̄(YWZ_x)

But Z_x Y is contained in XY and, by reflexivity and transitivity, (2)
implies

(3)    Z̄(XY) ⊆ Z̄(YWZ_x)

By applying transitivity to (1) and (3)

(4)    Z̄(X) ⊆ Z̄(YWZ_x)

But YWZ_x Z̄ contains all the attributes, and by complementation (4)
implies

(5)    T(X) ⊆ T(XZ̄)

Since (5) is equivalent to the MVD X →→ Z - Y, we are done.  []